Producing Public-use Microdata That Are Analytically Valid and Confidential
Abstract
A public-use microdata file should be analytically valid: for a very small number of uses, the microdata should yield analytic results that are approximately the same as those obtained from the original, confidential file, which is not distributed. If the microdata file contains a moderate number of variables and must meet a single set of analytic needs of, say, university researchers, then many more records are likely to be re-identified via modern record linkage methods than via the re-identification methods typically used in the confidentiality literature. This paper compares several masking methods in terms of their ability to produce analytically valid, confidential microdata.
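The trade-off in the abstract (how much masking is needed before record linkage stops re-identifying records) can be illustrated with a minimal sketch. It assumes additive Gaussian noise as the masking method and nearest-neighbour, distance-based record linkage as the re-identification method. Both are common choices in the statistical disclosure control literature, but they are not necessarily the methods compared in this paper. The function names `column_sds`, `mask_additive_noise`, and `reidentification_rate` and the `noise_frac` parameter are hypothetical, introduced only for illustration.

```python
import math
import random


def column_sds(records):
    """Population standard deviation of each variable (column) in a file."""
    n = len(records)
    sds = []
    for j in range(len(records[0])):
        col = [r[j] for r in records]
        mean = sum(col) / n
        sds.append(math.sqrt(sum((x - mean) ** 2 for x in col) / n))
    return sds


def mask_additive_noise(records, noise_frac, rng):
    """Mask a file by adding independent Gaussian noise to every value.

    noise_frac (hypothetical parameter) sets each variable's noise SD as a
    fraction of that variable's SD in the original file.
    """
    sds = column_sds(records)
    return [[x + rng.gauss(0.0, noise_frac * s) for x, s in zip(r, sds)]
            for r in records]


def reidentification_rate(original, masked):
    """Distance-based record linkage (an assumed re-identification method).

    Links each masked record to its nearest original record by Euclidean
    distance, after scaling every variable by its original SD so that no
    single variable dominates. Returns the fraction of masked records linked
    to their true source. Assumes masked[i] came from original[i] and that
    no variable is constant.
    """
    sds = column_sds(original)

    def scale(r):
        return [x / s for x, s in zip(r, sds)]

    orig_scaled = [scale(r) for r in original]
    hits = 0
    for i, m in enumerate(masked):
        m_scaled = scale(m)
        dists = [math.dist(m_scaled, o) for o in orig_scaled]
        if dists.index(min(dists)) == i:
            hits += 1
    return hits / len(masked)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a confidential file: 200 records, 4 variables.
    original = [[rng.gauss(50, 10) for _ in range(4)] for _ in range(200)]
    for frac in (0.05, 0.5, 2.0):
        masked = mask_additive_noise(original, frac, rng)
        rate = reidentification_rate(original, masked)
        print(f"noise_frac={frac}: re-identification rate {rate:.2f}")
```

Raising `noise_frac` lowers the re-identification rate. However, uncorrelated additive noise also inflates variances and attenuates correlations between variables, which is the tension between confidentiality and analytic validity that the abstract describes. A fuller evaluation would pair this rate with an analytic-validity check, such as comparing regression coefficients estimated on the original and masked files.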
Similar resources
Re-identification Methods for Evaluating the Confidentiality of Analytically Valid Microdata
Disclaimer: This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed are those of the author and not necessarily those of the U.S. Census Bureau. A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same...
Seeking explanation in theory: Reflections on the social practices of organizations that distribute public use microdata files for research purposes
(2001). Seeking explanation in theory: Reflections on the social practices of organizations that distribute public use microdata files for research purposes. Public concern about personal privacy has recently focused on issues of Internet data security and personal information as big business. The scientific discourse about information privacy focuses on the cross-pressures of maintaining conf...
NORC Data Enclave
Launched in 2006, the NORC Data Enclave provides a confidential, protected environment within which authorized researchers can access sensitive microdata remotely. While public-use data can be disseminated in a variety of ways, fewer options exist for sharing sensitive microdata that have not been fully de-identified for public use. Some data producers have sufficient economies of scale to deve...
Combining synthetic data with subsampling to create public use microdata files for large scale surveys
To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants’ confidential information. However, subsampling does not eliminate risks, so that alteration of the data is needed before dissemi...
Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information
The ability to collect and disseminate individually identifiable microdata is becoming increasingly important in a number of arenas. This is especially true in health care and national security, where this data is considered vital for a number of public health and safety initiatives. In some cases legislation has been used to establish some standards for limiting the collection of and access t...